DOCUMENT RESUME 



ED 053 772                                        LI 003 044

TITLE           Project Intrex. Semiannual Activity Report, 15 March
                1971 - September 1971.
INSTITUTION     Massachusetts Inst. of Tech., Cambridge.
SPONS AGENCY    Carnegie Corp. of New York, N.Y.; Council on Library
                Resources, Inc., Washington, D.C.; National Science
                Foundation, Washington, D.C.; Office of Education
                (DHEW), Washington, D.C.
REPORT NO       Intrex-PR-12
PUB DATE        15 Sep 71
NOTE            124p.; (40 References)

EDRS PRICE      EDRS Price MF-$0.65 HC-$6.58
DESCRIPTORS     *Computer Programs, Computers, Digital Computers,
                Economic Research, *Electronic Data Processing,
                Information Networks, *Information Retrieval,
                *Information Storage, *Library Automation, Library
                Services, Magnetic Tapes, Models, University
                Libraries, Use Studies
IDENTIFIERS     Computer Software, Library Role, *Project Intrex



ABSTRACT 

Libraries should resist the temptation to relinquish
to computing centers the burden of looking after the university’s
digital data record resources, for a growing volume of intellectually
important material will arrive at the university on digital data
tapes. If the user must arrange for access to this material outside
the library, he will be seriously disadvantaged. When a library
assumes responsibility for digital data record resources, the
selection of access techniques becomes the central question. The use
of interactive techniques in which the user is in direct
communication with the data file, combined with full-text displays, is
described. The Model Library Program, discussed in Section III, deals
with procedures that assist the user who seeks information in a mixed
regime of machine access techniques and conventional library
operations. The program’s objective is to examine system
configurations from the viewpoint of cost-benefits relationships and
to study interrelationships among factors such as data-base size,
content and cost, user population, equipment utilization, hardware
considerations, and networking through use of electrical
communications. (Other reports in this series are available as: ED
036 299, 036 301, 043 348, and 047 739). (Author/NH)






U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
OFFICE OF EDUCATION
THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM
THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR
OPINIONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE
OF EDUCATION POSITION OR POLICY



MASSACHUSETTS INSTITUTE OF TECHNOLOGY 
PROJECT INTREX 



SEMIANNUAL ACTIVITY REPORT 
15 March 1971 - 15 September 1971 



Intrex PR-12 
15 September 1971 









CAMBRIDGE 



MASSACHUSETTS 






ACKNOWLEDGMENTS



The research reported in this document was made possible through
the support extended the Massachusetts Institute of Technology,
Project Intrex, under grants from the Carnegie Corporation, the
Council on Library Resources, Inc., the National Science Foundation,
and the U.S. Office of Education.






TABLE OF CONTENTS 



INTRODUCTION 

RESEARCH AND DEVELOPMENT ACTIVITIES 
(Electronic Systems Laboratory)

A. STATUS OF THE PROGRAM 

B. SYSTEM USAGE: EXPERIMENTS AND ANALYSIS 

C. ECONOMIC ANALYSIS 

D. AUGMENTED-CATALOG INPUTTING 

E. COMPUTER SOFTWARE 

F. HARDWARE 

MODEL LIBRARY PROJECT 

A. STATUS OF THE PROJECT 

B. POINT-OF-USE INSTRUCTION 

C. PATHFINDERS 

D. USER PREFERENCE STUDY 

E. VISITORS PROGRAM 

PROJECT INTREX STAFF 






CURRENT PUBLICATIONS 

PAST PUBLICATIONS October, 1969 through 15 March 1971 



PROJECT INTREX 



Activity Report 



I. INTRODUCTION 

A reel of magnetic tape with digitally encoded data has none of the esthetic and 
symbolic appeal of a book. It is viewed with misgivings by the growing segment of our 
society which is turning away from science and technology and regards computers as 
dangerous, irritating and unnecessary afflictions. Nevertheless, digital data tapes are 
becoming important resources for the work of scholars at our universities, and libraries 
will have to face the question whether they should acquire, catalog, store, and make 
accessible these new materials. A conspicuous illustration of the impact of this question
is the action of the Center for Research Libraries in sponsoring the conversion of 1970 
U. S. Census tapes into a format more suitable for academic use. 

The question arises at a time when the computing budgets of many universities
have been expanding at a higher rate than library budgets. The sharpening competition
for every dollar in a university’s general funds does not strengthen the bonds of
friendship between computing centers and libraries. On the contrary, this competition has
given a new monetary twist to the otherwise nonsensical notion that the computing center
is becoming a threat to the central role of the library in the intellectual life of the
university.

In this situation, libraries will be tempted to relinquish to computing centers 
the burden of looking after the university’s digital data record resources. It is to be 
hoped that they will resist this temptation, even if it involves further inroads on other 
services. For there can be no doubt that a growing volume of intellectually important 
material will arrive at the university on digital data tapes. If the user must arrange 
for his access to such material outside the library, he will be seriously disadvantaged.
The effective use of digital data tapes will, in general, involve the concurrent use of 
other recorded information. Bibliographic data tapes are an outstanding example; they 
must be used in an environment that contains printed bibliographic resources. If the 
library rejects the new materials, they will ultimately be made available in a new 
facility in which some of the most active elements of the traditional library are dupli- 
cated. Even then, the user will not have at his command the full range of resources 
that he may need. The acceptance by the library of a gradually shrinking antiquarian 
role would surely not be in the best interests of the academic community. 

Once a library has decided, as a matter of long-term policy, to assume 
responsibility for the digital data record resources of the university, the selection of 
access techniques will become a central and continuing task. Beginning with
batch-processing services, the library will wish to advance to the more effective interactive
techniques in which the user is in direct communication with the data file. 

The use of such interactive techniques, combined with full-text displays, is 
described in the present report and its predecessors in this series. Experimental 
facilities for such use have been provided in the Barker Engineering Library at M.I.T. 
The Model Library Program, discussed in Section III, deals with procedures that 
assist the user who seeks information in a mixed regime of machine access techniques 
and conventional library operations. A number of distinguished librarians have visited 
this project during the spring and summer of 1971, and we have derived great benefit 
from their reactions. We hope that these exchanges will continue in the new academic
year.



Carl F. J. Overhage 
Cambridge, Massachusetts 
15 September 1971 



II. RESEARCH AND DEVELOPMENT ACTIVITIES (Electronic Systems Laboratory) 

A. STATUS OF THE PROGRAM 



Professor J.F. Reintjes 

During the past six months, the Intrex system has been available for use
on an open-environment basis at the Barker Engineering Library and the Materials
Science and Engineering Center, M.I.T., and at the McKay Laboratories, Harvard
University. In addition, other terminals at the M.I.T. Electronic Systems
Laboratory have been used in the conduct of controlled experiments. The several
hundred users of the system during this period have provided us with an excellent
opportunity to observe reactions to a machine-oriented document storage and
retrieval system and to measure the effectiveness of the Intrex configuration as an
information-retrieval mechanism.

Several patterns are becoming evident as we gain experience with Intrex. 
Our principal users, undergraduate and graduate students, are enthusiastic about 
the on-line interactive mode of operation. Ability to negotiate for information in 
real time directly with the computer at interactive terminals is highly exciting to 
all who have come to the system. We also observe that our on-line, step-by-step 
instructional aids are making it possible for even novice users to begin reception 
of useful information within a very few minutes after they first engage the system. 
Users are pleased with their ability to "get going" quickly.

The availability of full text at the cathode-ray-tube terminals is being
singled out for favorable comment. Having look-up and full-text-scanning
capabilities at the same location is being judged highly valuable from the viewpoints of
convenience, time-saving, and assurance of document usefulness.

Our experiences with the open environment during the spring academic 
term led us to seek ways to quantify more precisely observations on matters such 
as instructional aids, search strategies, and user reactions to the over-all machine 
environment. Interviews, questionnaires and combinations of these were evaluated 
during the summer months in preparation for formal use during the fall. 

We are intensifying our program aimed at the design of efficient document 
storage and retrieval systems through use of economic analyses and modeling 
techniques. Our objective is to examine system configurations from the viewpoint 
of cost-benefits relationships and to study interrelationships among factors such 
as data-base size, content and cost, user population, equipment utilization, hard- 
ware considerations, and networking through use of electrical communications. 



A new series of controlled experiments which seeks to establish the indica- 
tivity of various fields of catalog information has been inaugurated. Indicativity 
is defined as the relative usefulness of a specific field of information (title, 
abstract, subject-index terms, etc.) as an identifier of document value to a user, 
measured against his evaluation of the full text as an identifier. Certain modifica- 
tions in experimental procedure have been instituted in the new series in an effort 
to minimize experimental error. These experiments are discussed in Section II-B.

Analysis of results obtained from controlled experiments from the view- 
point of retrieval effectiveness of the Intrex System also continued during the 
reporting period. Factors being examined are those influenced by the free- 
vocabulary, in-depth indexing method being employed by Intrex, the organization 
and configuration of the computer software and the format into which the raw 
catalog information is structured. These matters are also presented in detail in 
Section II-B.

Selection of new documents for our experimental data base continues at 
the rate of approximately 3,000 documents per year.

With respect to equipment, we have determined the modifications required
to operate the M.I.T.-developed Intrex terminal in the Barker Engineering Library,
a distance of eighteen hundred feet from the buffer computer. Since information
is transmitted as pulse trains of a few hundred nanoseconds duration, pulse
distortion caused by coaxial-cable characteristics becomes significant and sufficient to
degrade performance. Modifications required to ensure reliable transmission
under remote-terminal conditions have been made and the terminal will be in
operation at the Engineering Library in early September.

B. SYSTEM USAGE: EXPERIMENTS AND ANALYSIS

Staff Members 

Mr. A. R. Benenfeld 
Mrs. S. F. Brown 
Miss M. A. Jackson 
Mr. P. Kugel 
Miss L. T. Lee 
Mr. R.S. Marcus 
Miss V. A. Miethe 

SUMMARY 

Greatly increased use of Intrex facilities in the open environments indicated 
widespread acceptance of the Intrex concepts of online interactive retrieval, guaran- 
teed access to full text, the retrieval software system and user aids. A program of 
controlled data collection on, and experiments with, users in the open environment 
has been initiated. The new series of catalog indicativity experiments has been run 
for nine subjects. Adviser training has been continued and improved. New instruc- 
tional aids have been developed and old ones improved. Additional statistics on the 
inverted files have been gathered. The Class Experiment conducted during the past 
fall term has received further analysis in the areas of retrieval rates for Intrex, 
library, and informal searching, and retrieval effectiveness as influenced by type of 
indexing and search strategies. 

INTREX FACILITIES IN OPEN ENVIRONMENTS 

User Stations. Intrex now maintains three library stations in the M.I.T.
and Harvard communities. The main station, located in the Barker Engineering
Library at M.I.T., provides the full spectrum of system capabilities through use of
a combined ARDS catalog/text-access terminal, a DATEL 30 typewriter catalog
terminal and a fiche collection with facilities for reproducing hard copy (paper and/or
microfiche) of full text.

A second M.I.T. station was opened this spring. This station, located in 
the Bush Building, brings the Intrex facilities to staff and students of our primary 
user community. The Bush station contains a combined ARDS terminal that permits 
both text and catalog access and plans are under way to move the film terminal to 
this station. 

A third station on the Harvard campus is located in the McKay Laboratory 
which is the primary center for research in materials sciences at Harvard. This 
station, which was opened during the winter of 1971, contains an IBM 2741 typewriter 
terminal and provides access to the catalog system only. 

Operating Experience. As the spring term progressed, more and more use
was made of the two Barker terminals until there was an average of about 15 usages per
day at these terminals. This amount of use is close to a saturation level at which the
terminals are in almost constant use; indeed, we often experienced a queue of potential
users waiting to take their turn. Since a majority of the clientele was making serious
use of the system, and a sizable portion (recently over one-half) represented repeat
users, this heavy usage is taken as clear evidence of the acceptance of the Intrex system
by the M.I.T. community.

We postulate several reasons for the increased usage of the system at this 
time. The re-introduction of the advisers on a full-time basis has clearly been an 
important help to users and has spurred use of the system. Also, other instructional 
aids have been improved, as described below. Furthermore, increased publicity both 
from the Engineering Library and by word of mouth has increased the awareness of 
Intrex among the M.I.T. community. It should be noted that user word-of-mouth 
communications would have a positive effect on the majority of our users, who are 
serious users, only if the users were being well served by the system. Finally, the 
system itself has been improved in a number of respects: program and hardware
improvements have added important capabilities as mentioned in this and previous reports;
the system has become much more reliable; and the data base has been growing
steadily.

During the spring academic term, the Intrex terminals at the Barker
Engineering Library were operated on a regular, five-hour-a-day, five-day-week schedule.
During the spring term, from February 1, 1971 through May 31, 1971, a total of 1,005
sessions, involving a total of approximately 450 separate users, was recorded for an
average of 12 sessions per working day.

Of those 1,005 sessions, slightly over half, 562, were by users who appear
from the monitor records of their transactions to have been primarily interested in the
material the system contains, whereas 443 were by users whose primary aim appears
to have been to learn about the system. Thus, on the average, 26 sessions a week
were initiated by serious users and 21 by what we might call "curious" users.

On January 25, 1971 we began to keep a directory of individual users of the 
system. This directory identifies each person by his status and affiliation and sum- 
marizes his use of the system. From this starting date and running through 31 May 
1971, the Intrex system had 453 distinct users of whom 294 (65 percent) used the sys- 
tem only once, 87 (19 percent) used the system twice, 34 (8 percent) used it three times 
and 38 (8 percent) used it more than three times. The largest number of sessions by a 
repeat user was 15; these engagements were at the Harvard terminal. 
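
The percentages quoted above follow directly from the reported counts. As a minimal sketch in Python (not part of the Intrex software; the variable and function names are illustrative assumptions), the shares of the distinct users and the number of repeat users can be recomputed as follows:

```python
# Distinct users by number of sessions, as reported for
# 25 January through 31 May 1971.
users_by_sessions = {
    "once": 294,
    "twice": 87,
    "three times": 34,
    "more than three times": 38,
}


def percent_shares(counts):
    """Return each category's share of the total, rounded to a whole percent."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


total_users = sum(users_by_sessions.values())
shares = percent_shares(users_by_sessions)
# Everyone who used the system more than once counts as a repeat user.
repeat_users = total_users - users_by_sessions["once"]

print(total_users)   # 453
print(shares)        # once: 65, twice: 19, three times: 8, more than three: 8
print(repeat_users)  # 159
```

The four categories sum to the 453 distinct users stated in the text, and 159 of them (about 35 percent) used the system more than once.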

Of the users during this period, 317 (70 percent) used the ARDS, whereas
218 (48 percent) used the DATEL or other typewriter consoles. The breakdown by
academic levels, which showed approximately identical numbers of graduate and
undergraduate students (a proportionality reflective of the M.I.T. student population), is
given in the following table:

Academic Level              Number of Users

Undergraduates                    124
Graduates                         127
Faculty                            10
Staff                               8
Visitors                           62
Unknown                            98
Harvard (all levels)               24

By department, the largest number of users (among those whose department could be 
identified) came from the Electrical Engineering Department, the largest department 
at M.I.T. The breakdown by departments is given in the following table:



Department                            No. of Identified Users

Electrical Engineering                          64
Metallurgy and Materials Science                32
Mechanical Engineering                          30
Physics                                         16
Aeronautics and Astronautics                    14
Naval Architecture                              14
Civil Engineering                               11
Chemistry                                        9
Chemical Engineering                             5
Mathematics                                      5
Nuclear Engineering                              4
Earth and Planetary Science                      2
Humanities                                       2
Management                                       2
Urban Studies                                    2
Architecture                                     1
Biology                                          1



A majority of the computer time used for these sessions (a total of 51.22
hours, or 3 minutes per session) was taken up by users of the combined ARDS console
(31.47 hours) and the remainder on the DATEL typewriter console (19.75 hours).
A typical ARDS user had a real-time to computer-time ratio of about 10; the
corresponding figure for the DATEL was about 15, the difference being caused by the slower
output rate of the typewriter.
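
The arithmetic behind these figures can be sketched in a few lines of Python. The names below are illustrative assumptions. The console-hour estimates assume that the "typical" real-time to computer-time ratios hold for all sessions at a given console, so they are a rough indication only and not figures reported by the project.

```python
# Figures reported for the spring-term sessions.
ARDS_COMPUTER_HOURS = 31.47
DATEL_COMPUTER_HOURS = 19.75
SESSIONS = 1005

total_computer_hours = ARDS_COMPUTER_HOURS + DATEL_COMPUTER_HOURS
minutes_per_session = total_computer_hours * 60 / SESSIONS


def real_time_hours(computer_hours, ratio):
    """Estimate console (real) time from computer time and a real/computer ratio."""
    return computer_hours * ratio


# Assumption: the typical ratios (about 10 and about 15) apply uniformly.
ards_console_hours = real_time_hours(ARDS_COMPUTER_HOURS, 10)
datel_console_hours = real_time_hours(DATEL_COMPUTER_HOURS, 15)

print(round(total_computer_hours, 2))  # 51.22
print(round(minutes_per_session, 1))   # 3.1
print(round(ards_console_hours, 2))    # 314.7
print(round(datel_console_hours, 2))   # 296.25
```

Under that assumption the two consoles would have been occupied for comparable amounts of real time, even though the ARDS accounts for most of the computer time.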

Although a number of operating problems were encountered, in general the 
system was "up" more than 90 percent of the scheduled time. The primary difficulties 
accounting for down time were hardware problems with the text-access and the time- 
shared computer system, and a logistics problem of ensuring that the terminals would 
always be turned on at the regularly scheduled times. 

During the period 1 February through 31 May, a total of 112 user requests were
made for either fiche or hard (paper) copies of Intrex documents at the Microreproduction
Facility in the Engineering Library. These included requests for 382 fiche copies
of documents in the collection. These numbers average 6.6 users of the Facility per
week, 22.4 fiche requests per week and 8.0 paper-copy requests per week. The
rather striking user preference for fiche over paper is probably largely attributable
to the fact that fiche copies are provided at no charge whereas paper copies cost 10
cents per page. However, some part of this preference is undoubtedly due to the extensive
microfiche viewing facilities available, including the ESL arm-chair reader
described in Section F.

During the summer, scheduled operations were cut back to a two-hour
period (1:00 to 3:00 p.m.), five days a week, for several reasons, including the desire
to alleviate staffing problems arising from summer vacations and the need to increase
staff availability for controlled experiments, and because it was thought that the
demand for system usage would decline during the summer months. Somewhat surprisingly,
the demand for system use continues at a fairly high level despite the summer lull in
academic activities. To accommodate demands, consoles in the Materials Center are
being made available for two hours (3:00 to 5:00 p.m.) by appointment.

The Harvard terminal showed an initial high usage rate (40
sessions in the first 20 days) but has since stabilized to a few users a week, at most.
Recent Harvard usages seem to fall into a rather homogeneous category: a few users
with large numbers of relatively short usages. While we have not as yet done sufficient
analysis of these Harvard usages to form conclusive judgments, it appears that
this group makes frequent, casual use of the system. The reasons for the relatively
low usage at Harvard compared to the Barker Library are conjectural; some possibilities
would seem to be lack of a text-access facility, no advisers, typewriter-only
terminal, and little publicity.

User Experience. The two primary sources of information on user opinions
of the system are spontaneous user comments and solicited responses to questionnaires.
Spontaneous comments are made either orally to the advisers or in writing.
The latter may be entered directly into the computer online or written in notebooks
kept for this purpose.

Intrex advisers report that the comments made to them tend, on the whole,
to be enthusiastic. During the spring term users entered a total of 84 written comments
in the comment books placed next to the two terminals in the Engineering Library.
These comments can be categorized as follows (several comments fit into more than
one category):

* 53 (63 percent) comments are favorable toward the system, usually
to a high degree.

* 39 (46 percent) comments note that the system is too limited either
because it does not cover enough material or is not available for a
long enough time each day. These comments can be construed as favorable
to the system concept in the sense that they imply a desire to use the system.

* 29 (35 percent) comments point out deficiencies in system performance
or specific system features.

* 6 (7 percent) make specific suggestions for system improvement.

We note that many of the users who made suggestions for improvements or 
criticized certain features also made favorable comments. Especially noteworthy is 
the fact that no user objected to the system concept. Collectively, these comments 
suggest an overwhelmingly favorable reaction to the online retrieval concept. 
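
Because a comment may fall into more than one category, the category percentages are each taken against the 84 comments and need not sum to 100. A minimal Python sketch of this tally follows; the dictionary keys are shorthand labels introduced here for illustration, not terms used by the project.

```python
TOTAL_COMMENTS = 84

# Number of written comments falling in each (overlapping) category.
category_counts = {
    "favorable": 53,
    "system too limited": 39,
    "deficiencies noted": 29,
    "suggestions": 6,
}

# Each share is computed against the number of comments, not the number
# of category assignments, so the shares can total more than 100 percent.
category_percent = {
    label: round(100 * n / TOTAL_COMMENTS) for label, n in category_counts.items()
}

# Category assignments in excess of one per comment.
extra_assignments = sum(category_counts.values()) - TOTAL_COMMENTS

print(category_percent)   # favorable 63, too limited 46, deficiencies 35, suggestions 7
print(extra_assignments)  # 43
```

The 127 category assignments exceed the 84 comments by 43, which is consistent with the observation that many users who criticized particular features also commented favorably.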

Those who wrote favorably of the system did so in unmistakable terms.

Examples:

"I was really impressed. Found 27 potential articles."

"The Intrex system is a great contribution to the library system.
I hope the entire library will some day be operated in this way."

"Remarkable system. It makes searching many times faster and n!
(n factorial) times more interesting."

"I wish we had an Intrex Terminal. It is extremely helpful."
(A user from Boston University)

Deficiencies, when noted, were principally in the categories of hardware
problems and reliability. Thus:

"Text almost illegible"

(Comments like this one and the next are often correlated with faulty adjustment
of focus or brightness of the cathode-ray tube.)

"Visual screen blurry. Too small for effective blowup of articles."

"The system is great (when it works . . )"

"The on/off switch for the console is in a very poor location.
It is easy to bump accidentally."

"The spacing lever gets stuck."

Comments on shortness of operating time and limited data base are
summarized by these seven:

"I have come at various times to do thesis-related work only to
find the machine taken up . . . Five hours a day is too little."

"Why is Intrex up for so little time?"

"Please extend your system to more fields as quickly as possible
(e.g., astrophysics)."

"How about more stuff on tunneling in superconductors."

"Hopefully, Intrex will soon include E.E. in its catalog of information.
It is a very useful system and I would like to see it expanded."

"It is frustrating to have a system like this only in an experimental
stage. Hopefully, the data base will be greatly expanded within
the next year (while I am still around) ... In other words, a
working system. Your experimental stage is successful."

In a two-week period (May 10 through May 21) a questionnaire, asking
opinions of system features, was distributed to the users of the Intrex consoles at the
Engineering Library. A total of 45 users filled in the questionnaire, including 18
undergraduates, 21 graduate students, 4 faculty or staff members, and 2 visitors. The
results of this questionnaire indicate the following:

* About 86 percent of the users preferred Intrex searching to standard
library searching and about 65 percent greatly preferred it.

* More than half (56 percent) were repeat users, suggesting that the
novelty effect is no longer responsible for most of our usage.

* A large majority of the users (76 percent) come to the system to
do a subject search.

* More than half (64 percent) found some useful documents, and of
those who did not, 33 percent did not originally intend to do
serious searching; they came just to try out the system.

* While the average user found the printed instructional material
at least adequate, he found the personal adviser much more
helpful.



Experimental Program. A series of programs to make more effective use 
of the open environment at the Engineering Library for system evaluation is currently 
being developed. These programs are designed to focus on specific problems in system 
evaluation and to obtain quantitative data to develop precise system specifications. 

Three such programs are currently in various stages of development. Each 
program is planned as a separate program for intensive application at the beginning of 
the Fall term; but some consideration is being given to running at least two of them in 
parallel since much of the data to support them are related. The three areas to be 
focussed on first for intensive study are: 

The evaluation of the user aids provided to help 
users learn about the system. 

The effect that the ready availability of full text 
has on user search strategies. 

The retrieval effectiveness of the system and of 
alternative search strategies. 

A program of user experiments to evaluate the effectiveness of user aids has
been developed, and six users had been run through a preliminary version of this
program by the end of July. User background and experience with user aids are
determined by carefully questioning the user prior to the session. His use of the system is
then carefully monitored not only by the usual procedures of computer monitoring and
the adviser's observations but also by an observer who records the behavior of the user,
the adviser, and the system, carefully noting those features that would be missed by the
other monitoring methods.

A post -session questionnaire is then used to obtain the user's opinions of the 
various aids and to attempt to resolve features of his behavior that are not apparent 
from observations. These data are then reduced to provide quantitative measures to
determine what aids, and what features of those aids, are most helpful to the user.

A second program is under development to attempt to determine the utility 
of the immediate availability of full text and to determine its effect on user search 
strategies. This program will also be based on the use of pre- and post-session 
questionnaires and careful monitoring and post-session analysis of user behavior. We
find that careful questioning of the user and thorough analysis of monitor data are
required to obtain a full and correct understanding of the interactive process.

We expect that all three of these experimental programs will be producing a 
steady stream of increasingly reliable data during the fall term. 

INTREX LIBRARIAN-TRAINING PROGRAM



A restructured program to train two additional library staff members as
advisers to users on the Intrex retrieval system was offered during the spring. The
restructured training program resulted from a review of the program which was run
during the winter of 1971 and which was described in detail in the Semiannual Activity
Report of 15 March 1971. A review of that program with its participants indicated a
further need to: (1) decrease the time required to run the program; (2) place even
greater emphasis on practice console sessions; (3) separate the parts of the course
relating to system use, system description, and the role of the adviser more clearly
than heretofore; (4) increase the emphasis on, and practice of, the role of the adviser.

The order of presentation of topics has been shifted so that the roles of the 
adviser are now discussed midway in the program immediately after covering the more 
general and elementary features of system use. At that midpoint, a practice advisory 
session at an active console in the open library environment was incorporated into the 
program. That session includes observation of a trained Intrex adviser plus a demon- 
stration of the system to at least one user by the trainee. From this session, the 
trainee is expected to gain an increased sensitivity to the duties and problems of the 
adviser and an appreciation of the need to further understand some of the system de- 
tails. Discussion of the detailed system structure, previously scattered throughout 
the program, has now been relegated to the latter part of the program. After the prac- 
tical session, the trainee can also function as an assistant adviser when queueing prob- 
lems develop in the console area. 

The total program has been compressed into 18 1/2 two-hour daily sessions,
whereas the program last winter had been scheduled for 25 two-hour sessions. Other 
beneficial features of the previous program have been retained including the participa- 
tion of previously trained advisers as console instructors, and testing of the trainees 
at the conclusion of each unit. A new outline of the instructional units is given below. 



Outline of Instructional Units for
Librarian-Training Program

KEY: L  - Lecture or Discussion Session
     C  - Console Session
     LC - Lecture plus Console Session (about one hour for each part)

A. Or[illegible]
   (L)    1. Role of the Adviser; Demonstration
   (C)    2. Basic Steps in Intrex Use: Search; Ca[illegible]
             Text Output; Typing Errors

B. [heading illegible]
   (LC)   3. Simple Subject Searches
   (LC)   4. Combining Subject Searches: NAME, AND, OR, NOT, WITH Commands
   (LC)   5. Other Primary Searches: TITLE, AUTHOR, DOCUMENT
             Commands; Combining Primary Search Commands
   (LC)   6. Outputting: OUTPUT Command; Off-line Output; Functional Catalog
             Field Groups; Pseudo-Catalog Fields; Text Output; Library
             Microreproduction Facility
   (LC)   7. Search of Uninverted Fields: RESTRICT Command and "Eyeballing"
   (C)    8. Miscellaneous Intrex Commands: LONG, SHORT, TIME ON,
             TIME OFF, COMMENT Commands

C. Roles of the Adviser (3 sessions)
   (L)    9. The Adviser’s Job: Instructing, Aiding and Observing Users;
             Recording Information
   (LC)  10. INFO, LOG, BEGIN, HOLD, EXIT, QUIT Commands; Layout of
             Console Areas; Different Terminals and their Use; Printed
             Instructional Materials and Aids; Trouble Shooting
   (C)   11. [illegible] one demonstration

D. Sy[illegible]
   (LC)  12. [illegible] Structure; Catalog Fields
   (L)   13. [illegible]
   (2L)  14. Data Organization within the Computer: Inverted File and Catalog
             Record Structures; Stemming and Phrase Decomposition; CTSS;
             Computers, Codes, Overlays; Message Handling
   (LC)  15. List Handling: USE, USE ADD, SAVE FILE, SAVE, DROP,
             LIST Commands
   (L)   16. Text Access Facilities
   (L)   17. Strategies: Roles of Thesauri, Inverted File Listings, Booleans
             and Synonyms; Reference Interview
   (L)   18. Summary and Review Discussion



INSTRUCTIONAL AIDS 

The Intrex Retrieval System offers users a broad variety of ways to learn
system use. Originally, there were three such ways: a printed Reference Guide that
fully describes system capabilities and their use; an online, computer-stored version
of the Reference Guide that a user can call for selectively through the INFO
command; and a collection of system messages that indicate to the user the results
of his previous request and what options are open to him at any given point in the
interaction. Later, we augmented these facilities by making an Intrex adviser
available to assist users.

During this reporting period six new user aids have been developed to fill 
gaps that users have noticed and to provide a broader range of aids to meet the
spectrum of user needs. These new aids are the following: 

1. The User's Guide, written by Professor L. S. Bryant of the M.I.T.
Humanities Department, offers a more pedagogical introduction to the system than does
the more encyclopedic Reference Guide. Information in this Guide is presented in a
leisurely, discursive manner with many examples and hints.

2. The Summary Guide, a four-page pamphlet, provides a brief introduction
for the user who prefers a document that seems less physically imposing than the
Reference Guide, whose apparent length has discouraged some users. This Guide is
primarily intended to meet the needs of the user who is in a hurry to get started in his
use of the system.

3. "Project Intrex - A Brief Description" was written by Professor
C. F. J. Overhage. It provides a description of the Intrex Project with special
emphasis on the Intrex retrieval system. This document describes system
implementation and is thoroughly illustrated. It is intended to meet the needs of the
user who is more interested in learning about the system itself than in actually
obtaining information from the system's document collection.

4. "Project Intrex - Samples of Catalog Interactions" was also prepared by
Professor Overhage. It consists of the teletypewriter records of two sample dialogs
together with explanatory notes. This aid is intended to serve the user who finds it
easier to learn from an example than by following a detailed explanation.

5. A sound-slide introduction to Intrex, described in greater detail in the
Model Library Section, was developed to provide users with a simple, easily
understood, overall introduction to the use of the Intrex system.

6. A capability that permits a member of the Intrex Staff to serve as an
adviser from a remote location, via a second console that is slaved to the user's
console, was also developed (see the Software section). This facility has been used to
assist a user at Harvard, from an M.I.T. console, and appears to hold considerable
promise.



A major revision of the Reference Guide has been prepared to correct known 
errors and to bring the Guide fully up to date. This version is in press. 

Minor revisions have been made to the online dialog and the responses to the 
INFO command. Major revisions of this dialog and the computer-stored version of the
Reference Guide are planned to update these features in line with the revised printed 
Reference Guide. 

Improvements in the instructional aids also include more functional rearrange- 
ments of the catalog field descriptions and the subject areas covered by Intrex. These 
improvements are described further in Section D on Inputting. 

SUBJECT/TITLE INVERTED-FILE CHARACTERISTICS 

Characteristics of the inverted files are important in analyzing retrieval 
effectiveness and other properties of Intrex. Statistical characteristics of the Intrex 
subject/title inverted file have been studied by Dr. Syunsuke Uemura, a visiting 
researcher from the Electrotechnical Laboratory of Tokyo, Japan. The details of the 
study, which are given in a separate report, are summarized here. Statistics are 
reported primarily for the May 1971 version of the subject/title inverted file, although
several earlier versions were examined in order to study changes in some character- 
istics as a function of file growth. The gross characteristics of that file, which covers 
the combined, non-common subject index and title words from 15,845 document catalog 
records, include 35,645 word types, 27,291 stem types, and 1,204,282 stem occurrences. 

A document is assigned an average of 10 index terms (including the title as a
term) with 7.6 words per term (excluding 13 common words on a stop list). Thus there
are an average of 76 word tokens in the complete inverted-file-index set per document.
These 76 word tokens represent an average of 39.1 unique stem types, so that each
stem type occurs an average of 76/39.1 = 1.95 times per document. The average
length of a stem type is 7.8 characters.

A stem type has associated with it in its inverted-file list an average of 1.3
full (unstemmed) word types. Only 16 percent of the stem types have two or more 
word types but these, of course, include most of the commonly used search words. 
For example, of 31 search words used by students on the "irradiation embrittlement" 
topic in the Class Experiment (see below), 29 had stems with multiple word types in 
the inverted file, one had only a single type and one was not in the inverted file at all. 

The average frequency of a stem type (its number of occurrences) is 44.1. 
Almost half (13,085 or 48 percent) of the total number of stem types occur only once. 
At the other end of the scale, 260 stem types, or 0.95 percent, occur more than 1000
times, and this group of high-frequency stems accounts for 54 percent of all stem
occurrences. The most frequent stem is "magnet-" with 16,824 occurrences.
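
The averages and percentages in this paragraph follow directly from the gross file counts quoted earlier. The short Python sketch below is an illustration only, not part of the Intrex software; the constant and function names are assumptions introduced here, and only the four counts are taken from the text.

```python
# Counts quoted in the text for the May 1971 subject/title inverted file.
STEM_TYPES = 27_291
STEM_OCCURRENCES = 1_204_282
SINGLETON_STEMS = 13_085   # stem types that occur only once
HIGH_FREQ_STEMS = 260      # stem types that occur more than 1000 times


def share(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Average number of occurrences per stem type.
mean_frequency = STEM_OCCURRENCES / STEM_TYPES
# Percentage of stem types that occur exactly once.
singleton_share = share(SINGLETON_STEMS, STEM_TYPES)
# Percentage of stem types in the high-frequency group.
high_freq_share = share(HIGH_FREQ_STEMS, STEM_TYPES)

print(f"mean occurrences per stem type: {mean_frequency:.1f}")
print(f"stem types occurring once: {singleton_share:.0f} percent")
print(f"stem types occurring more than 1000 times: {high_freq_share:.2f} percent")
```

Rounded to the precision used in the text, these reproduce the reported 44.1 occurrences per stem type, 48 percent and 0.95 percent.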

The average number of documents referenced per stem type is 22.6. This
gives the same Redundancy Ratio, defined as the number of index words per
document divided by the number of stem types per document, which equals, in this case,
44.1/22.6 or 1.95, as calculated above. This ratio was 2.23 for the first 5500
documents indexed (which had a total index of 101.5 words with 45.4 stem types per 
document). The Redundancy Ratio was 1.75 for the latest 10,000 documents included 
in the file (which set had 62.7 index words and 35.8 stem types per document). An 
intentional change in the indexing process to reduce redundancy, which was described 
in the 15 September 1968 Semiannual Activity Report, brought about this decrease. It 
is important to consider these variations when measuring the retrieval effectiveness 
of Intrex as a function of depth of indexing as is done in the Section that reports on the 
analysis of the Class Experiment. 
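
The Redundancy Ratio comparison can be recomputed from the per-document averages quoted above. The following Python sketch is illustrative only; the function name and the dictionary layout are assumptions made here. Because the inputs are the rounded averages printed in this report, the results agree with the reported ratios only to within about 0.01.

```python
def redundancy_ratio(index_words_per_doc: float, stem_types_per_doc: float) -> float:
    """Index words per document divided by stem types per document."""
    return index_words_per_doc / stem_types_per_doc


# (index words per document, stem types per document), as quoted in the text.
samples = {
    "whole file (15,845 documents)": (76.0, 39.1),
    "first 5500 documents": (101.5, 45.4),
    "latest 10,000 documents": (62.7, 35.8),
}

ratios = {label: redundancy_ratio(words, stems)
          for label, (words, stems) in samples.items()}

# The same ratio for the whole file, taken from the per-stem averages:
# occurrences per stem type divided by documents referenced per stem type.
ratio_from_stem_averages = 44.1 / 22.6

for label, ratio in ratios.items():
    print(f"{label}: {ratio:.2f}")
print(f"whole file, from per-stem averages: {ratio_from_stem_averages:.2f}")
```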

A ranking of the 260 high-frequency stems at several stages of inverted-file
growth shows little or no change in the rank ordering. High-frequency stems
were analyzed in terms of their redundancy on the basis of document references, using 
another redundancy ratio, defined for each stem as frequency per number of distinct 
documents. Synonyms were found to have similar redundancy-ratio values, whereas 
they do not necessarily rank closely when ordered by occurrences only. Less-tech- 
nical words in this high-frequency set have a lower stem redundancy ratio than do 
more-technical words. 
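
The per-stem redundancy ratio can be illustrated with a toy example. The Python sketch below assumes a simplified posting list that holds one document number per occurrence of a stem; this layout, the sample data and the function name are hypothetical and do not describe the actual Intrex inverted-file format.

```python
# Hypothetical postings: one document number per occurrence of the stem.
postings = {
    "magnet": [1, 1, 1, 2, 2, 3],   # 6 occurrences spread over 3 documents
    "field": [1, 2, 3],             # 3 occurrences spread over 3 documents
}


def stem_redundancy(doc_numbers: list[int]) -> float:
    """Frequency of a stem divided by the number of distinct documents it references."""
    return len(doc_numbers) / len(set(doc_numbers))


stem_ratios = {stem: stem_redundancy(docs) for stem, docs in postings.items()}

for stem, ratio in stem_ratios.items():
    print(f"{stem}: {ratio:.2f}")
```

A stem that is repeated within the index terms of the same document has a ratio above 1, while a stem that appears once in each document it references has a ratio of exactly 1.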

During the period 1968 to 1971 when the document collection grew by a
factor of 17 (from 955 to 15,845), the number of stem types increased only by a factor 
of 4 (from 6,700 to 27,291). As shown in Fig. IIB-1, the number of stem types added 
to the inverted file during that period decreased from 6.7 to 0.85 per document. 
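
The growth factors quoted above can be checked from the four counts given in the text. The Python sketch below is illustrative only, and the variable names are assumptions. The final quantity, the average number of new stem types per added document over the whole period, is not stated in the report; it is derived here from the quoted counts and falls between the initial and final rates of 6.7 and 0.85.

```python
# Collection and vocabulary sizes quoted in the text.
docs_1968, docs_1971 = 955, 15_845
stems_1968, stems_1971 = 6_700, 27_291

# Growth factors over the period.
doc_growth = docs_1971 / docs_1968
stem_growth = stems_1971 / stems_1968

# Average number of new stem types per document added between 1968 and 1971.
avg_new_stems_per_added_doc = (stems_1971 - stems_1968) / (docs_1971 - docs_1968)

print(f"document collection grew by a factor of {doc_growth:.1f}")
print(f"stem types grew by a factor of {stem_growth:.1f}")
print(f"average new stem types per added document: {avg_new_stems_per_added_doc:.2f}")
```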

Fig. IIB-1  Number of New Stem Types per Document as a
Function of Document Collection Size

These new stem types are quite indicative of the stems with extremely low frequency
and were analyzed in some detail as follows. All 309 new single-occurrence stems in
a recent update batch of 482 documents with 3,254 stem types, were categorized. Some
32 percent of the new stem types were derived from technical words (21 percent were
chemical formulae, 2 percent were chemical or material names, and 9 percent were
other technical words), 24 percent were either symbols or numerics, 26 percent were
proper nouns, 4 percent were general words, 11 percent were misspellings, and 5
percent were words that already appeared in the inverted file but were now being taken
as new words because of untrimmed punctuation. Personal names accounted for more
than 80 percent of the proper-noun category. It has been noted that many of the new
chemical or material names and other technical words are compound words formed
with prefixes.



The percentage distribution of stem occurrences by the range number
assigned to the subject-index phrases in which they occur is:

     Range 0    (Generic term)           2.8 %
     Range 1    (Major subject)         14.7 %
     Range 2    (Secondary subject)     33.1 %
     Range 3    (Minor subject)         28.9 %
     Range 4    (Tool or technique)     11.1 %
     Range 5    (Title)                  9.3 %